
36 research outputs found

Multitask and transfer learning for multi-aspect data

Author: Romera Paredes B
Publication venue: UCL (University College London)
Publication date: 28/12/2014

Supervised learning aims to learn functional relationships between inputs and outputs. Multitask learning tackles supervised learning tasks by performing them simultaneously to exploit commonalities between them. In this thesis, we focus on the problem of eliminating negative transfer in order to achieve better performance in multitask learning. We start by considering a general scenario in which the relationship between tasks is unknown. We then narrow our analysis to the case where data are characterised by a combination of underlying aspects, e.g., a dataset of images of faces, where each face is determined by a person's facial structure, the emotion being expressed, and the lighting conditions. In machine learning there have been numerous efforts based on multilinear models to decouple these aspects but these have primarily used techniques from the field of unsupervised learning. In this thesis we take inspiration from these approaches and hypothesize that supervised learning methods can also benefit from exploiting these aspects. The contributions of this thesis are as follows: 1. A multitask learning and transfer learning method that avoids negative transfer when there is no prescribed information about the relationships between tasks. 2. A multitask learning approach that takes advantage of a lack of overlapping features between known groups of tasks associated with different aspects. 3. A framework which extends multitask learning using multilinear algebra, with the aim of learning tasks associated with a combination of elements from different aspects. 4. A novel convex relaxation approach that can be applied both to the suggested framework and more generally to any tensor recovery problem. Through theoretical validation and experiments on both synthetic and real-world datasets, we show that the proposed approaches allow fast and reliable inferences. 
Furthermore, when performing learning tasks on an aspect of interest, accounting for secondary aspects leads to significantly more accurate results than using traditional approaches.

UCL Discovery

Predicting Future Instance Segmentation by Forecasting Convolutional Features

Author: A Yang
B Romera-Paredes
J Walker
KM Kitani
PO Pinheiro
R Sutton
T Lan
T-Y Lin
Publication date: 08/09/2018

Anticipating future events is an important prerequisite towards intelligent behavior. Video forecasting has been studied as a proxy task towards this goal. Recent work has shown that to predict semantic segmentation of future frames, forecasting at the semantic level is more effective than forecasting RGB frames and then segmenting these. In this paper we consider the more challenging problem of future instance segmentation, which additionally segments out individual objects. To deal with a varying number of output labels per image, we develop a predictive model in the space of fixed-sized convolutional features of the Mask R-CNN instance segmentation model. We apply the "detection head" of Mask R-CNN on the predicted features to produce the instance segmentation of future frames. Experiments show that this approach significantly improves over strong baselines based on optical flow and repurposed instance segmentation architectures.

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Emotion Recognition by Two View SVM_2K Classifier on Dynamic Facial Expression Features

Author: Bianchi-Berthouze N
Meng H
Romera-Paredes B
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 25/03/2011

A novel emotion recognition system has been proposed for classifying facial expressions in videos. Firstly, two types of basic facial appearance descriptors were extracted. The first type of descriptor, called Motion History Histogram (MHH), was used to detect temporal changes of each pixel of the face. The second type of descriptor, called Histogram of Local Binary Patterns (LBP), was applied to each frame of the video and was used to capture local textural patterns. Secondly, based on these two basic types of descriptors, two new dynamic facial expression features called MHH_EOH and LBP_MCF were proposed. These two features incorporate both dynamic and local information. Finally, the Two View SVM_2K classifier was built to integrate these two dynamic features in an efficient way. The experimental results showed that this method outperformed the baseline results set by the FERA'11 challenge.
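
As a rough illustration of the Histogram of Local Binary Patterns descriptor mentioned in the abstract above, the following minimal sketch computes the basic 3x3 LBP code for each interior pixel of a grayscale frame and counts the codes in a 256-bin histogram. The abstract does not give the exact LBP variant, neighbourhood, or bit ordering used by the authors, so the clockwise ordering and the function names `lbp_code` and `lbp_histogram` are assumptions made for illustration only.

```python
# Basic 3x3 Local Binary Pattern (LBP) histogram for one grayscale frame.
# Assumption: neighbours are read clockwise starting at the top-left pixel;
# the paper's exact variant is not stated in the abstract.

# (row offset, column offset) of the 8 neighbours, clockwise from top-left.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """Return the 8-bit LBP code of the pixel at (row, col).

    A bit is set when the corresponding neighbour is at least as bright
    as the centre pixel.
    """
    centre = image[row][col]
    code = 0
    for bit, (d_row, d_col) in enumerate(NEIGHBOUR_OFFSETS):
        if image[row + d_row][col + d_col] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Count LBP codes over all interior pixels (border pixels are skipped)."""
    histogram = [0] * 256
    for row in range(1, len(image) - 1):
        for col in range(1, len(image[0]) - 1):
            histogram[lbp_code(image, row, col)] += 1
    return histogram


if __name__ == "__main__":
    frame = [[1, 2, 3],
             [8, 5, 4],
             [7, 6, 9]]
    print(lbp_code(frame, 1, 1))
```

Applying `lbp_histogram` to every frame of a video yields one texture histogram per frame; how those per-frame histograms are combined into the dynamic LBP_MCF feature is not described in the abstract and is not attempted here.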

UCL Discovery

Zero-Shot Hashing via Transferring Supervised Knowledge

Author: Frome A.
Gionis A.
Huang E. H.
Jayaraman D.
Kang W.-C.
Krizhevsky A.
Larochelle H.
Liu W.
Norouzi M.
Petrović S.
Romera-Paredes B.
Socher R.
Turian J.
Weiss Y.
Wen Z.
Wu T. T.
Xia R.
Zhang H.
Publication date: 01/01/2016

Hashing has shown its efficiency and effectiveness in facilitating large-scale multimedia applications. Supervised knowledge (e.g., semantic labels or pair-wise relationships) associated with data is capable of significantly improving the quality of hash codes and hash functions. However, confronted with the rapid growth of newly-emerging concepts and multimedia data on the Web, existing supervised hashing approaches may easily suffer from the scarcity and validity of supervised information due to the expensive cost of manual labelling. In this paper, we propose a novel hashing scheme, termed zero-shot hashing (ZSH), which compresses images of "unseen" categories to binary codes with hash functions learned from limited training data of "seen" categories. Specifically, we project independent data labels (i.e., 0/1-form label vectors) into semantic embedding space, where semantic relationships among all the labels can be precisely characterized and thus seen supervised knowledge can be transferred to unseen classes. Moreover, in order to cope with the semantic shift problem, we rotate the embedded space to more suitably align the embedded semantics with the low-level visual feature space, thereby alleviating the influence of semantic gap. In the meantime, to exert positive effects on learning high-quality hash functions, we further propose to preserve local structural property and discrete nature in binary codes. Besides, we develop an efficient alternating algorithm to solve the ZSH model. Extensive experiments conducted on various real-life datasets show the superior zero-shot image retrieval performance of ZSH as compared to several state-of-the-art hashing methods.
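
To make the general idea of hashing-based retrieval in the abstract above concrete, the sketch below shows a generic linear hash function (the sign of a few projections) and ranking by Hamming distance. It does not implement ZSH: the semantic embedding, the rotation of the embedded space, and the alternating optimisation are not reproduced, and the projection weights, item names, and function names are hypothetical.

```python
# Generic binary hashing and Hamming-distance retrieval.
# Assumption: hash functions are sign-thresholded linear projections with
# made-up weights; this is NOT the ZSH learning procedure itself.


def binary_code(features, projections):
    """Map a feature vector to a tuple of bits, one bit per projection row."""
    return tuple(
        1 if sum(w * f for w, f in zip(row, features)) >= 0 else 0
        for row in projections
    )


def hamming(code_a, code_b):
    """Number of bit positions at which two equal-length codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


def retrieve(query_code, database, k=1):
    """Return the k item names whose codes are closest to the query code."""
    ranked = sorted(database, key=lambda name: hamming(query_code, database[name]))
    return ranked[:k]


if __name__ == "__main__":
    projections = [[1, 0], [0, 1], [1, -1]]      # hypothetical learned weights
    database = {"image_a": (1, 0, 1), "image_b": (0, 1, 0)}
    query = binary_code([2, -1], projections)
    print(query, retrieve(query, database))
```

Because codes are short bit strings, comparing a query against a large database reduces to counting differing bits, which is the efficiency argument the abstract refers to.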

arXiv.org e-Print Archive

Crossref

University of Queensland eSpace

Tensor completion in hierarchical tensor representations

Compressed sensing extends from the recovery of sparse vectors from undersampled measurements via efficient algorithms to the recovery of matrices of low rank from incomplete information. Here we consider a further extension to the reconstruction of tensors of low multi-linear rank in recently introduced hierarchical tensor formats from a small number of measurements. Hierarchical tensors are a flexible generalization of the well-known Tucker representation, which have the advantage that the number of degrees of freedom of a low rank tensor does not scale exponentially with the order of the tensor. While corresponding tensor decompositions can be computed efficiently via successive applications of (matrix) singular value decompositions, some important properties of the singular value decomposition do not extend from the matrix to the tensor case. This results in major computational and theoretical difficulties in designing and analyzing algorithms for low rank tensor recovery. For instance, a canonical analogue of the tensor nuclear norm is NP-hard to compute in general, which is in stark contrast to the matrix case. In this book chapter we consider versions of iterative hard thresholding schemes adapted to hierarchical tensor formats. A variant builds on methods from Riemannian optimization and uses a retraction mapping from the tangent space of the manifold of low rank tensors back to this manifold. We provide first partial convergence results based on a tensor version of the restricted isometry property (TRIP) of the measurement map. Moreover, an estimate of the number of measurements is provided that ensures the TRIP of a given tensor rank with high probability for Gaussian measurement maps. Comment: revised version, to be published in Compressed Sensing and Its Applications (edited by H. Boche, R. Calderbank, G. Kutyniok, J. Vybiral).

arXiv.org e-Print Archive

Crossref

Publikationsserver der RWTH Aachen University

Clinically Applicable Segmentation of Head and Neck Anatomy for Radiotherapy: Deep Learning Algorithm Development and Validation Study

Author: Askham H
Back T
Blackwell S
Boon C
Carnell D
Chu C
D'Souza D
De Fauw J
Fuller K
Garie B
Hampton K
Hughes CO
Ireland S
Karthikesalingam A
Kelly C
Ledsam JR
Livne M
McQuinlan Y
Mendes R
Meyer C
Moinuddin SA
Montgomery H
Nikolov S
Patel Y
Rees G
Romera-Paredes B
Ronneberger O
Suleyman M
Zverovitch A
Publication date: 01/07/2021

BACKGROUND: Over half a million individuals are diagnosed with head and neck cancer each year globally. Radiotherapy is an important curative treatment for this disease, but it requires manual time to delineate radiosensitive organs at risk. This planning process can delay treatment while also introducing interoperator variability, resulting in downstream radiation dose differences. Although auto-segmentation algorithms offer a potentially time-saving solution, the challenges in defining, quantifying, and achieving expert performance remain. OBJECTIVE: Adopting a deep learning approach, we aim to demonstrate a 3D U-Net architecture that achieves expert-level performance in delineating 21 distinct head and neck organs at risk commonly segmented in clinical practice. METHODS: The model was trained on a data set of 663 deidentified computed tomography scans acquired in routine clinical practice and with both segmentations taken from clinical practice and segmentations created by experienced radiographers as part of this research, all in accordance with consensus organ at risk definitions. RESULTS: We demonstrated the model's clinical applicability by assessing its performance on a test set of 21 computed tomography scans from clinical practice, each with 21 organs at risk segmented by 2 independent experts. We also introduced surface Dice similarity coefficient, a new metric for the comparison of organ delineation, to quantify the deviation between organ at risk surface contours rather than volumes, better reflecting the clinical task of correcting errors in automated organ segmentations. The model's generalizability was then demonstrated on 2 distinct open-source data sets, reflecting different centers and countries to model training. CONCLUSIONS: Deep learning is an effective and clinically applicable technique for the segmentation of the head and neck anatomy for radiotherapy. 
With appropriate validation studies and regulatory approvals, this system could improve the efficiency, consistency, and safety of radiotherapy pathways.

UCL Discovery

Clinically applicable deep learning for diagnosis and referral in retinal disease

Author: A Esteva
Adnan Tufail
AG Roy
Alan Karthikesalingam
AR Rudnicka
B Foot
Balaji Lakshminarayanan
Bernardino Romera-Paredes
Brendan O’Donoghue
Catherine Egan
CG Owen
Clemens Meyer
CS Lee
Cían O. Hughes
D Castelvecchi
D Huang
Daniel Visentin
Dawn A. Sim
Demis Hassabis
Dominic King
E Villani
FA Folgar
Faith Mackinder
George van den Driessche
Geraint Rees
Harry Askham
Hugh Montgomery
J Fauw De
J Schindelin
JC Buchan
JD Whited
Jeffrey De Fauw
Joseph R. Ledsam
Julian Hughes
Julien Cornebise
Kareem Ayoub
KB Schaal
L Arias
L Fang
Mustafa Suleyman
Nenad Tomasev
Olaf Ronneberger
PA Keane
Pearse A. Keane
Peng T. Khaw
PP Srinivasan
PS Muether
R Chopra
Reena Chopra
Rosalind Raine
RRA Bourne
S Farsiu
Sam Blackwell
Simon Bouton
SPK Karri
Stanislav Nikolov
T Schlegl
Trevor Back
U Schmidt-Erfurth
V Gulshan
Xavier Glorot
Publication date: 13/08/2018

The volume and complexity of diagnostic imaging is increasing at a pace faster than the availability of human expertise to interpret it. Artificial intelligence has shown great promise in classifying two-dimensional photographs of some common diseases and typically relies on databases of millions of annotated images. Until now, the challenge of reaching the performance of expert clinicians in a real-world clinical pathway with three-dimensional diagnostic scans has remained unsolved. Here, we apply a novel deep learning architecture to a clinically heterogeneous set of three-dimensional optical coherence tomography scans from patients referred to a major eye hospital. We demonstrate performance in making a referral recommendation that reaches or exceeds that of experts on a range of sight-threatening retinal diseases after training on only 14,884 scans. Moreover, we demonstrate that the tissue segmentations produced by our architecture act as a device-independent representation; referral accuracy is maintained when using tissue segmentations from a different type of device. Our work removes previous barriers to wider clinical use without prohibitive training data requirements across multiple pathologies in a real-world setting.

Crossref

UCL Discovery

Recurrent instance segmentation

Author: Romera-Paredes B
Torr PHS
Publication venue: Springer Verlag
Publication date: 01/01/2016

Instance segmentation is the problem of detecting and delineating each distinct object of interest appearing in an image. Current instance segmentation approaches consist of ensembles of modules that are trained independently of each other, thus missing opportunities for joint learning. Here we propose a new instance segmentation paradigm consisting of an end-to-end method that learns how to segment instances sequentially. The model is based on a recurrent neural network that sequentially finds objects and their segmentations one at a time. This net is provided with a spatial memory that keeps track of what pixels have been explained and allows occlusion handling. In order to train the model we designed a principled loss function that accurately represents the properties of the instance segmentation problem. In the experiments carried out, we found that our method outperforms recent approaches on multiple person segmentation, and all state-of-the-art approaches on the Plant Phenotyping dataset for leaf counting.

Oxford University Research Archive

A One-Vs-One classifier ensemble with majority voting for activity recognition

Author: Aung M. S.H.
Bianchi-Berthouze N.
Romera-Paredes B.
Publication date: 11/11/2013

A solution for the automated recognition of six full body motion activities is proposed. This problem is posed by the release of the Activity Recognition database [1] and forms the basis for a classification competition at the European Symposium on Artificial Neural Networks 2013. The data-set consists of motion characteristics of thirty subjects captured using a single device delivering accelerometric and gyroscopic data. Included in the released data-set are 561 processed features in both the time and frequency domains. The proposed recognition framework consists of an ensemble of linear support vector machines, each trained to discriminate a single motion activity against another single activity. A majority voting rule is used to determine the final outcome. For comparison, a six "winner take all" multiclass support vector machine ensemble and k-Nearest Neighbour models were also implemented. Results show that the system accuracy for the one versus one ensemble is 96.4% for the competition test set. Similarly, the multiclass SVM ensemble and k-Nearest Neighbour returned accuracies of 93.7% and 90.6% respectively. The outcomes of the one versus one method were submitted to the competition, resulting in the winning solution.
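
The one-versus-one scheme with majority voting described in the abstract above can be sketched as follows. This is only an illustration of the voting structure: each pairwise classifier here is a toy nearest-centre rule on a single number, standing in for the linear support vector machines used in the paper, and the class centres, the tie-breaking rule, and all function names are hypothetical.

```python
# One-vs-one ensemble with majority voting.
# Assumption: a toy 1-D nearest-centre rule replaces each pairwise linear SVM;
# centres and the alphabetical tie-break are invented for illustration.
from collections import Counter
from itertools import combinations


def make_pairwise_classifier(label_a, centre_a, label_b, centre_b):
    """Return a classifier that only ever chooses between two labels."""
    def classify(x):
        return label_a if abs(x - centre_a) <= abs(x - centre_b) else label_b
    return classify


def build_ensemble(centres):
    """Create one pairwise classifier for every unordered pair of classes."""
    return [
        make_pairwise_classifier(a, centres[a], b, centres[b])
        for a, b in combinations(sorted(centres), 2)
    ]


def predict(ensemble, x):
    """Let every pairwise classifier vote and return the most voted label."""
    votes = Counter(classify(x) for classify in ensemble)
    top = max(votes.values())
    # Ties are broken alphabetically so that the result is deterministic.
    return min(label for label, count in votes.items() if count == top)


CENTRES = {"walking": 0.0, "sitting": 5.0, "standing": 10.0}  # hypothetical
ENSEMBLE = build_ensemble(CENTRES)

if __name__ == "__main__":
    print(len(ENSEMBLE), predict(ENSEMBLE, 4.2))
```

With n classes the ensemble holds n(n-1)/2 pairwise classifiers, so the six activities in the abstract would require 15 of them.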

University of East Anglia digital repository

Pooling Objects for Recognizing Scenes without Examples

Author: Jayaraman D.
Mikolov T.
Norouzi M.
Romera-Paredes B.
Zhou B.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2016

In this paper we aim to recognize scenes in images without using any scene images as training data. Different from attribute based approaches, we do not carefully select the training classes to match the unseen scene classes. Instead, we propose pooling over ten thousand off-the-shelf object classifiers. To steer the knowledge transfer between objects and scenes we learn a semantic embedding with the aid of a large social multimedia corpus. Our key contributions are: we are the first to investigate pooling over ten thousand object classifiers to recognize scenes without examples; we explore the ontological hierarchy of objects and analyze the influence of object classifiers from different hierarchy levels; we exploit object positions in scene images; and we demonstrate new scene retrieval scenarios with complex queries. Finally, we outperform attribute representations on two challenging scene datasets, SUNAttributes and Places2.

Crossref

UvA-DARE

International Migration, Integration and Social Cohesion online publications